About our data:
Our YouTube data set has 161,470 records and 17 variables: “video_id”, “trending_date”, “title”, “channel_title”, “category_id”, “publish_date”, “time_frame”, “published_day_of_week”, “publish_country”, “tags”, “views”, “likes”, “dislikes”, “comment_count”, “comments_disabled”, “ratings_disabled”, and “video_error_or_removed”. The variables span several data types that we use in our analysis, including character, integer, boolean, and factor. Each row in the data set is a particular trending video on a specific trending date. Four countries are analyzed: the United States, Canada, Great Britain, and France. Each country has its own list of trending videos on each day. The trending videos were collected from November 2017 to June 2018. According to Google (the owner of YouTube), the trending list is updated approximately every 15 minutes (Citation Needed). Thus, the number of videos that are trending throughout a day fluctuates. The number of videos on the trending list at any given time is around 200 in each country.
Our questions focus on five main areas:
For the most part, our data set is quite user-friendly. When loading the data, R automatically assigns a data type to each variable. However, some of the automatically assigned types were not helpful for later analysis, so we changed them.
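A minimal sketch of this kind of type conversion, using base R. The three-row sample, the ISO date format, and the "True"/"False" strings are assumptions made for illustration; the raw file may encode these columns differently.

```r
# Hypothetical sample mimicking how a few columns might look right after loading.
videos <- data.frame(
  trending_date     = c("2017-11-14", "2017-11-15", "2018-06-14"),
  publish_country   = c("US", "GB", "FRANCE"),
  comments_disabled = c("False", "True", "False"),
  views             = c(1200, 5400, 880),
  stringsAsFactors  = FALSE
)

# Convert each column to a type that suits later analysis.
videos$trending_date     <- as.Date(videos$trending_date, format = "%Y-%m-%d")  # character -> Date
videos$publish_country   <- factor(videos$publish_country)                      # character -> factor
videos$comments_disabled <- videos$comments_disabled == "True"                  # character -> logical
videos$views             <- as.integer(videos$views)                            # double -> integer

# Inspect the resulting column types.
str(videos)
```

Checking the result with `str()` after each conversion makes it easy to spot a column that silently turned into `NA` because the assumed format did not match.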
We also had to clean the category_id variable. The raw data assigns a number to each category. We looked up which category each number corresponds to using a list of YouTube API category IDs (https://gist.github.com/dgp/1b24bf2961521bd75d6c) and relabeled the numbers as factor levels carrying the category names.
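The relabeling step can be sketched as follows. The lookup covers only five of the categories, and `raw_ids` is a hypothetical stand-in for the category_id column; this is an illustration, not necessarily the exact code used for the report.

```r
# Subset of YouTube category IDs mapped to the category names used in this report.
category_lookup <- c(
  "10" = "Music",
  "22" = "People and Blogs",
  "23" = "Comedy",
  "24" = "Entertainment",
  "25" = "News and Politics"
)

# Hypothetical raw values as they might appear in the category_id column.
raw_ids <- c(24, 10, 25, 24, 23, 22)

# Turn the numeric codes into a factor whose labels are the category names.
category_id <- factor(
  as.character(raw_ids),
  levels = names(category_lookup),
  labels = unname(category_lookup)
)

# Tabulate how often each category appears.
table(category_id)
```

Any ID missing from the lookup becomes `NA`, which is a convenient way to detect codes that still need a label.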
Our data consists of 18 different video categories. These categories are broad and range from “Pets and Animals” to “Music” to “News and Politics”. The graph below shows the number of trending videos in each category in our data.
Thus, we can see that “Entertainment” videos are the most frequent type of video on the trending list. It is important to note that each video can only be listed under one category.
We can also see how the number of trending videos in each category changes over time. Our data consists of trending videos from November of 2017 through June of 2018. Figure 2 below shows the number of trending videos over time in the categories “Entertainment,” “Music,” “People and Blogs,” “Comedy,” and “News and Politics”. These are the five categories with the most trending videos, as shown above. We aim to answer the following question:
How has the number of trending videos for different categories changed over time?
In analyzing the above graph, we see that the number of trending videos for “Entertainment” stays relatively constant over time. For “Music” videos, we see an increase starting around March that continues into May. The increase in “Music” makes sense, as many artists release music during this time (so the song can become popular before summertime but is still considered new). The categories “Comedy,” “People and Blogs,” and “News and Politics” remain constant until “Music” begins to increase, and all three decrease while “Music” rises.
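A count of trending videos per category, like the one behind the graph, could be computed along these lines. The six-row `videos` data frame is a hypothetical toy sample, so its counts do not reflect the real data.

```r
# Hypothetical toy sample; each row is one trending video on one trending date.
videos <- data.frame(
  category_id = factor(c("Entertainment", "Music", "Entertainment",
                         "Comedy", "Entertainment", "Music"))
)

# Count rows per category and order from most to least frequent.
category_counts <- sort(table(videos$category_id), decreasing = TRUE)

category_counts
```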
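One way to obtain the counts behind such a time trend is to bucket trending dates by month and tabulate by category. This is a sketch under assumptions: the sample rows are hypothetical, and the report may have aggregated by day or week instead of by month.

```r
# Hypothetical toy sample of trending records.
videos <- data.frame(
  trending_date = as.Date(c("2018-02-10", "2018-02-20", "2018-03-05",
                            "2018-03-15", "2018-03-25", "2018-04-02")),
  category_id   = c("Music", "Entertainment", "Music",
                    "Music", "Entertainment", "Music")
)

# Bucket each trending date into its calendar month.
videos$month <- format(videos$trending_date, "%Y-%m")

# Cross-tabulate month by category and reshape to one row per combination.
monthly_counts <- as.data.frame(
  table(month = videos$month, category_id = videos$category_id),
  responseName = "n_videos",
  stringsAsFactors = FALSE
)

monthly_counts
```

Because `table()` fills in every month and category combination, months in which a category had no trending videos appear with a count of zero rather than being dropped, which keeps plotted lines continuous.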
## # A tibble: 2 x 2
##   category_id      rate
##   <fct>           <dbl>
## 1 Entertainment 0.00385
## 2 Music         0.00216
## # A tibble: 10 x 2
##    trending_date nbr_trending_videos
##    <date>                      <int>
##  1 2018-04-01                    780
##  2 2018-04-02                    780
##  3 2018-04-03                    789
##  4 2018-04-04                    786
##  5 2018-04-05                    792
##  6 2018-04-06                    798
##  7 2018-04-07                    794
##  8 2018-04-14                    790
##  9 2018-04-15                    785
## 10 2018-04-16                    781
## # A tibble: 4 x 2
##   publish_country mean_videos
##   <chr>                 <dbl>
## 1 CANADA                 199.
## 2 FRANCE                 199.
## 3 GB                     190.
## 4 US                     200.
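A per-country daily average like the one in the table above can be computed in two steps: count videos per country per day, then average those daily counts. The sketch below uses base R and a hypothetical toy sample; the column name `video_count` is introduced here for illustration only.

```r
# Hypothetical toy sample: one row per trending video per trending date.
videos <- data.frame(
  trending_date   = as.Date(c("2018-04-01", "2018-04-01", "2018-04-01",
                              "2018-04-02", "2018-04-02", "2018-04-01",
                              "2018-04-02", "2018-04-02", "2018-04-02")),
  publish_country = c("US", "US", "US", "US", "US", "GB", "GB", "GB", "GB")
)

# Step 1: count trending videos per country per day.
videos$video_count <- 1L
daily_counts <- aggregate(video_count ~ publish_country + trending_date,
                          data = videos, FUN = sum)

# Step 2: average the daily counts within each country.
mean_videos <- aggregate(video_count ~ publish_country,
                         data = daily_counts, FUN = mean)

mean_videos
```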